Suggestion of new function: `describe_missing()` #561

rempsyc · 2024-11-11T11:03:12Z

Fixes #454

etiennebacher

Thank you. I think it would be good to have `describe_missing()`, but the way it is implemented and documented looks very field-specific to me. I find the output of `skimr::skim()` easier to understand, with `n_missing` and `complete_rate` for instance. I'm also not at all familiar with aggregating statistics on missing values across several variables (e.g. `Ozone:Wind`), and the default output is not what I would expect (I'd rather have one row per variable).
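For illustration, the one-row-per-variable layout could look roughly like the base R sketch below. `missing_by_variable()` is a hypothetical helper, not code from this PR; the column names `n_missing` and `complete_rate` are borrowed from the `skimr::skim()` output mentioned in the comment.

```r
# Hypothetical helper (not part of the PR): one row per variable,
# with the count of NAs and the share of non-missing values.
missing_by_variable <- function(data) {
  # sum() over a logical vector returns an integer count of TRUE values
  n_missing <- vapply(data, function(x) sum(is.na(x)), integer(1))
  data.frame(
    variable = names(data),
    n_missing = unname(n_missing),
    complete_rate = 1 - unname(n_missing) / nrow(data),
    stringsAsFactors = FALSE
  )
}

# airquality ships with base R and contains the Ozone and Wind columns
missing_by_variable(airquality)
```

This returns a plain data frame, so it can be filtered or sorted like any other table.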

etiennebacher · 2024-11-12T14:19:02Z

R/describe_missing.R

+#' @description Provides a detailed description of missing values in a data frame.
+#' This function reports both absolute and percentage missing values of specified
+#' column lists or scales, following recommended guidelines. Some authors recommend
+#' reporting item-level missingness per scale, as well as a participant's maximum
+#' number of missing items by scale. For example, Parent (2013) writes:
+#'
+#' *I recommend that authors (a) state their tolerance level for missing data by scale
+#' or subscale (e.g., "We calculated means for all subscales on which participants gave
+#' at least 75% complete data") and then (b) report the individual missingness rates
+#' by scale per data point (i.e., the number of missing values out of all data points
+#' on that scale for all participants) and the maximum by participant (e.g., "For Attachment
+#' Anxiety, a total of 4 missing data points out of 100 were observed, with no participant
+#' missing more than a single data point").*


This sounds a bit too focused on survey data, whereas this function could be useful for all kinds of data. I'd rather keep the first one or two sentences here and move the rest to a dedicated section in 'Details' (but even there, this seems very field-specific).
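For readers outside survey research, the two quantities in the quoted passage (missing data points out of all data points on a scale, and the maximum number missing for any one participant) can be sketched as follows. `scale_missingness()` and the `anx*` item names are hypothetical and only illustrate the computation; they are not the PR's implementation.

```r
# Hypothetical helper: missingness for one scale, i.e. a set of item columns.
scale_missingness <- function(data, items) {
  # Logical matrix: TRUE where a cell is missing (rows = participants)
  na_mat <- is.na(data[, items, drop = FALSE])
  data.frame(
    n_items = length(items),
    n_missing = sum(na_mat),                      # missing cells on the scale
    n_cells = length(na_mat),                     # all data points on the scale
    missing_percent = 100 * sum(na_mat) / length(na_mat),
    max_missing_per_row = max(rowSums(na_mat))    # worst single participant
  )
}

# Made-up data: four participants, three items of one scale
df <- data.frame(
  anx1 = c(1, NA, 3, 4),
  anx2 = c(2, 2, NA, 4),
  anx3 = c(5, NA, 1, 2)
)
scale_missingness(df, c("anx1", "anx2", "anx3"))
```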

etiennebacher · 2024-11-12T14:23:25Z

R/describe_missing.R

+#' missing more than a single data point").*
+#'
+#' @param data The data frame to be analyzed.
+#' @param vars Variable (or lists of variables) to check for missing values (NAs).


We use `select`, `exclude`, etc. in all other data frame functions, so I think we should use them here as well.
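A minimal sketch of what a `select`/`exclude` interface might look like is given below. It assumes both arguments are plain character vectors of column names; the package's own selection machinery may accept more than that. `pick_columns()` and `count_missing()` are hypothetical names used only for this illustration.

```r
# Hypothetical helper: resolve `select` and `exclude` to column names.
# Assumption: both are character vectors (or NULL).
pick_columns <- function(data, select = NULL, exclude = NULL) {
  cols <- if (is.null(select)) names(data) else intersect(select, names(data))
  setdiff(cols, exclude)
}

# Hypothetical wrapper: count NAs in the resolved columns only.
count_missing <- function(data, select = NULL, exclude = NULL) {
  cols <- pick_columns(data, select, exclude)
  vapply(data[cols], function(x) sum(is.na(x)), integer(1))
}

count_missing(airquality, select = c("Ozone", "Solar.R", "Wind"), exclude = "Wind")
```

With `select = NULL`, all columns are considered, so the defaults describe the whole data frame.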

etiennebacher · 2024-11-12T14:26:12Z

R/describe_missing.R

+#'
+#' @param data The data frame to be analyzed.
+#' @param vars Variable (or lists of variables) to check for missing values (NAs).
+#' @param scales The scale names to check for missing values (as a character vector).


I find this description of `scales` unclear. Can you give a bit more detail?

etiennebacher · 2024-11-12T14:27:35Z

R/describe_missing.R

+#' @param data The data frame to be analyzed.
+#' @param vars Variable (or lists of variables) to check for missing values (NAs).
+#' @param scales The scale names to check for missing values (as a character vector).
+#' @keywords missing values NA guidelines


I never really understood the point of `@keywords` (apart from `@keywords internal`). Where do they appear in the docs?

etiennebacher · 2024-11-12T14:28:31Z

R/describe_missing.R

+#' @keywords missing values NA guidelines
+#' @return A dataframe with the following columns:
+#'  - `var`: Variables selected.
+#'  - `items`: Number of items for selected variables.


I think `unique_values` would be clearer than `items`.

etiennebacher · 2024-11-12T14:29:26Z

R/describe_missing.R

+#'  - `na`: Number of missing cell values for those variables (e.g., 2 missing
+#'  values for the first participant + 2 missing values for the second participant
+#'  = total of 4 missing values).


Again, this sounds very field-specific. I think we could keep it simple:

Suggested change

-#'  - `na`: Number of missing cell values for those variables (e.g., 2 missing
-#'  values for the first participant + 2 missing values for the second participant
-#'  = total of 4 missing values).
+#'  - `na`: Number of missing values for those variables.

etiennebacher · 2024-11-12T14:33:59Z

R/describe_missing.R

+#' # One can list the scale names directly:
+#' describe_missing(df, scales = c("ID", "open", "extrovert", "agreeable"))
+describe_missing <- function(data, vars = NULL, scales = NULL) {
+  classes <- lapply(data, class)


`classes` is never used.

Suggestion of new function: describe_missing()

f879900

Fixes #454

rempsyc marked this pull request as draft November 11, 2024 11:31

rempsyc added 3 commits November 11, 2024 21:25

Suggestion of new function: describe_missing()

ab9f006

Fixes #454

styler, update dic

218b7f4

Suggestion of new function: describe_missing()

ebaeb68

Fixes #454

rempsyc marked this pull request as ready for review November 11, 2024 21:19

rempsyc requested a review from etiennebacher November 11, 2024 21:19

news.md

c3c1302

etiennebacher requested changes Nov 12, 2024

Merge branch 'main' into rempsyc/issue454

357dbbc
